The LDBC Social Network Benchmark Interactive workload v2: A transactional graph query benchmark with deep delete operations
The LDBC Social Network Benchmark’s Interactive workload
captures an OLTP scenario operating on a correlated social network graph.
It consists of complex graph queries executed concurrently with a stream
of update operations. Since its initial release in 2015, the Interactive
workload has become the de facto industry standard for benchmarking
transactional graph data management systems. As graph systems have
matured and the community’s understanding of graph processing features
has evolved, we initiated the renewal of this benchmark. This paper
describes the draft Interactive v2 workload with several new features:
delete operations, a cheapest path-finding query, support for larger data
sets, and a novel temporal parameter curation algorithm that ensures
stable runtimes for path queries.
DuckPGQ: Bringing SQL/PGQ to DuckDB
In this research project, we investigate an alternative to the standard
cloud-centralized data architecture. Specifically, we aim to leave
part of application data under the control of the individual data
owners in conceptually decentralized personal data stores. Our
primary goal is to increase data minimization, i. e., enabling more
sensitive personal data to be under the control of its owners while
providing a straightforward and efficient framework for architects
to design data architectures that allow applications to run and their
data to be analyzed. To serve this purpose, the centralized part of
the schema contains aggregating views over this decentralized data.
We propose to design a declarative language that extends SQL, for
architects to specify at the schema level different kinds of tables:
decentralized, centralized, and replicated, as well as centralized
materialized views, and in addition, the sensitivity of decentralized
columns and their minimum granularity levels, when these end up
in centralized views. When users modify their personal data stores,
the changes need to be reflected in the centralized views while
ensuring privacy; this calls for the integration of cryptography
techniques in distributed materialized view maintenance. We finally
aim to implement this system, where the personal data stores could
either live in mobile devices or encrypted cloud storage, in order to
evaluate its performance properties experimentally.
We demonstrate the most important new feature of SQL:2023, namely SQL/PGQ, which eases querying graphs using SQL by introducing new syntax for pattern matching and (shortest) path-finding. We show how support for SQL/PGQ can be integrated into an RDBMS, specifically in the DuckDB system, using an extension module called DuckPGQ. As such, we also demonstrate the use of the DuckDB extensibility mechanism, which allows us to add new functions, data types, operators, optimizer rules, storage systems, and even parsers to DuckDB. We also describe the new data structures and algorithms that the DuckPGQ module is based on, and how they are injected into SQL plans.
While the demonstrated DuckPGQ extension module is lean and efficient, we sketch a roadmap to (i) improve its performance through new algorithms (factorized and WCOJ) and better parallelism and (ii) extend its functionality to scenarios beyond SQL, e.g., building and analyzing Graph Neural Networks.
DuckPGQ: Efficient property graph queries in an analytical RDBMS
In the past decade, property graph databases have emerged as a
growing niche in data management. Many native graph systems
and query languages have been created, but the functionality and
performance still leave much room for improvement. The upcoming
SQL:2023 will introduce the Property Graph Queries (SQL/PGQ)
sub-language, giving relational systems the opportunity to
standardize graph queries and provide mature graph query functionality.
We argue that (i) competent graph data systems must build on
all technology that makes up a state-of-the-art relational system,
(ii) the graph use case requires the addition to that of a
many-source/destination path-finding algorithm and compact graph
representation, and (iii) incites research in practical worst-case-optimal
joins and factorized query processing techniques.
We outline our design of DuckPGQ that follows this recipe,
by adding efficient SQL/PGQ support to the popular open-source
“embeddable analytics” relational database system DuckDB, also
originally developed at CWI. Our design aims at minimizing
technical debt using an approach that relies on efficient vectorized UDFs.
We benchmark DuckPGQ showing encouraging performance and
scalability on large graph data sets, but also reinforcing the need
for future research under (iii).
LSQB: A large-scale subgraph query benchmark
We introduce LSQB, a new large-scale subgraph query benchmark. LSQB tests the performance of database management systems on an important class of subgraph queries overlooked by existing benchmarks. Matching a labelled structural graph pattern, referred to as subgraph matching, is the focus of LSQB. In relational terms, the benchmark tests DBMSs' join performance as a choke-point since subgraph matching is equivalent to multi-way joins between base Vertex and base Edge tables on ID attributes. The benchmark focuses on read-heavy workloads by relying on global queries, which have been ignored by prior benchmarks. Global queries, also referred to as unseeded queries, are queries constrained only by labels on the query vertices and edges. LSQB contains a total of nine queries and leverages the LDBC social network data generator for scalability. The benchmark gained both academic and industrial interest and is used internally by 5+ different vendors.
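The equivalence between subgraph matching and multi-way joins can be illustrated with a minimal Python sketch. It is not one of LSQB's nine queries: the function name `count_triangles`, the unlabelled edge list, and the sample data are assumptions made for illustration. The sketch counts bindings of a directed triangle pattern by joining an edge table with itself three times on vertex IDs, with no seed vertex, in the spirit of a global query.

```python
from collections import defaultdict

def count_triangles(edges):
    """Count bindings of the pattern a->b->c->a via a three-way self-join on vertex IDs."""
    # Index the edge table by source ID so each join step is a lookup.
    out = defaultdict(set)
    for src, dst in edges:
        out[src].add(dst)

    count = 0
    for a, b in edges:                # first edge e1(a, b)
        for c in out.get(b, ()):      # join e2 on e1.dst = e2.src
            if a in out.get(c, ()):   # join e3 on e2.dst = e3.src and e3.dst = e1.src
                count += 1
    return count

# Hypothetical sample data: one directed cycle 1->2->3->1 plus an extra edge.
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
print(count_triangles(edges))  # 3: the cycle is matched once per starting edge
```

Because the pattern has no seed, every edge is a potential starting point, so the cost is dominated by the size of the intermediate join results.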
Formalising openCypher Graph Queries in Relational Algebra
Graph database systems are increasingly adapted for storing and processing
heterogeneous network-like datasets. However, due to the novelty of such
systems, no standard data model or query language has yet emerged.
Consequently, migrating datasets or applications even between related
technologies often requires a large amount of manual work or ad-hoc solutions,
thus subjecting the users to the possibility of vendor lock-in. To avoid this
threat, vendors are working on supporting existing standard languages (e.g.
SQL) or creating standardised languages.
In this paper, we present a formal specification for openCypher, a high-level
declarative graph query language with an ongoing standardisation effort. We
introduce relational graph algebra, which extends relational operators by
adapting graph-specific operators and define a mapping from core openCypher
constructs to this algebra. We propose an algorithm that allows systematic
compilation of openCypher queries.
Comment: ADBIS conference (21st European Conference on Advances in Databases
and Information Systems). The final publication is available at Springer via
https://doi.org/10.1007/978-3-319-66917-5_1
LAGraph: Linear algebra, network analysis libraries, and the study of graph algorithms
Graph algorithms can be expressed in terms of linear algebra. GraphBLAS is a library of low-level building blocks for such algorithms that targets algorithm developers. LAGraph builds on top of the GraphBLAS to target users of graph algorithms with high-level algorithms common in network analysis. In this paper, we describe the first release of the LAGraph library, the design decisions behind the library, and performance using the GAP benchmark suite. LAGraph, however, is much more than a library. It is also a project to document and analyze the full range of algorithms enabled by the GraphBLAS. To that end, we have developed a compact and intuitive notation for describing these algorithms. In this paper, we present that notation with examples from the GAP benchmark suite
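As a rough illustration of expressing a graph algorithm in linear algebra, the following Python sketch computes breadth-first search levels by repeated vector-matrix products over the Boolean (OR, AND) semiring, masked by the set of unvisited vertices. It uses plain lists and does not call the GraphBLAS or LAGraph APIs; the function name `bfs_levels` and the sample adjacency matrix are assumptions made for illustration.

```python
def bfs_levels(adj, source):
    """BFS levels from `source`; adj[u][v] is truthy if there is an edge u->v.

    Unreachable vertices keep level -1.
    """
    n = len(adj)
    levels = [-1] * n
    frontier = [False] * n
    frontier[source] = True
    level = 0
    while any(frontier):
        # Record the current level for every vertex in the frontier.
        for v in range(n):
            if frontier[v]:
                levels[v] = level
        # next[v] = OR_u (frontier[u] AND adj[u][v]), masked by "not yet visited".
        frontier = [
            levels[v] == -1 and any(frontier[u] and adj[u][v] for u in range(n))
            for v in range(n)
        ]
        level += 1
    return levels

# Hypothetical 4-vertex graph: 0->1, 0->2, 1->3, 2->3.
adj = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(bfs_levels(adj, 0))  # [0, 1, 1, 2]
```

A dense list-of-lists matrix keeps the sketch short; practical implementations rely on sparse matrices so that each step touches only the edges leaving the frontier.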
The LDBC social network benchmark: Business intelligence workload
The Social Network Benchmark’s Business Intelligence workload (SNB BI) is a comprehensive graph OLAP benchmark targeting analytical data systems capable of supporting graph workloads. This paper marks the finalization of almost a decade of research in academia and industry via the Linked Data Benchmark Council (LDBC). SNB BI advances the state of the art in synthetic and scalable analytical database benchmarks in many aspects. Its base is a sophisticated data generator, implemented on a scalable distributed infrastructure, that produces a social graph with small-world phenomena, whose value properties follow skewed and correlated distributions and where values correlate with structure. This is a temporal graph where all nodes and edges follow lifespan-based rules with temporal skew enabling realistic and consistent temporal inserts and (recursive) deletes. The query workload exploiting this skew and correlation is based on LDBC’s “choke point”-driven design methodology and will entice technical and scientific improvements in future (graph) database systems. SNB BI includes the first adoption of “parameter curation” in an analytical benchmark, a technique that ensures stable runtimes of query variants across different parameter values. Two performance metrics characterize peak single-query performance (power) and sustained concurrent query throughput. To demonstrate the portability of the benchmark, we present experimental results on a relational and a graph DBMS. Note that these do not constitute an official LDBC Benchmark Result – only audited results can use this trademarked term.